
Victoria Krakovna

Google DeepMind

Gram: Assessing sabotage propensities via automated alignment auditing

May 28, 2026

Realistic honeypot evaluations for scheming propensity

May 28, 2026

Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety

Jul 15, 2025

Evaluating Frontier Models for Stealth and Situational Awareness

May 02, 2025

An Approach to Technical AGI Safety and Security

Apr 02, 2025

Evaluating Frontier Models for Dangerous Capabilities

Mar 20, 2024

Limitations of Agents Simulated by Predictive Models

Feb 08, 2024

Quantifying stability of non-power-seeking in artificial agents

Jan 07, 2024

Gemini: A Family of Highly Capable Multimodal Models

Dec 19, 2023

Power-seeking can be probable and predictive for trained agents

Apr 13, 2023